A Fast and Efficient Algorithm for Finding Frequent Items over Data Stream
Abstract
We investigate the problem of finding the frequent items in a continuous data stream. We present an algorithm called λ-Count for computing frequency counts over a user-specified threshold on a data stream. To emphasize the importance of the more recent data items, a fading factor is used. Our algorithm can detect ε-approximate frequent items of a data stream using O(log_λ ε) memory space and O(1) time to process each data record. The computation time for answering each query is O(log ...), and for answering a query about the frequentness of a given data item is O(1). An experimental study shows that λ-Count outperforms other methods in terms of accuracy, memory requirement, and processing speed.
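To make the role of the fading factor concrete, the following is a minimal sketch of decayed frequency counting. It is not the λ-Count algorithm itself, whose data structure the abstract does not describe. The class name `FadingCounter` and its methods are hypothetical, and the sketch keeps an exact decayed count for every item seen, so it does not achieve the memory bound stated above. It assumes a fading factor λ with 0 < λ < 1, under which an occurrence that arrived k records ago contributes λ^k to its item's count.

```python
class FadingCounter:
    """Exact decayed counts with a fading factor (illustrative, not lambda-Count)."""

    def __init__(self, lam):
        if not 0.0 < lam < 1.0:
            raise ValueError("fading factor must lie strictly between 0 and 1")
        self.lam = lam
        self.t = 0            # number of records processed so far
        self.total = 0.0      # decayed weight of the whole stream
        self.entries = {}     # item -> (count at last update, time of last update)

    def process(self, item):
        """Consume one record; decay is applied lazily, only to the touched item."""
        self.t += 1
        self.total = self.total * self.lam + 1.0
        count, last = self.entries.get(item, (0.0, self.t))
        faded = count * self.lam ** (self.t - last)
        self.entries[item] = (faded + 1.0, self.t)

    def decayed_count(self, item):
        """Current decayed count of an item (0.0 if it was never seen)."""
        if item not in self.entries:
            return 0.0
        count, last = self.entries[item]
        return count * self.lam ** (self.t - last)

    def frequent(self, support):
        """Items whose decayed count is at least `support` times the decayed total."""
        threshold = support * self.total
        return sorted(i for i in self.entries if self.decayed_count(i) >= threshold)


fc = FadingCounter(lam=0.5)
for record in ["a", "b", "b"]:
    fc.process(record)
print(fc.decayed_count("a"), fc.decayed_count("b"), fc.frequent(0.5))
```

Because each stored count carries the time of its last update, processing a record touches only one entry, and older entries are faded only when they are read. An approximate method would additionally discard entries whose decayed count has become negligible, which this sketch omits.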
Similar resources
Mining frequent items in a stream using flexible windows
We study the problem of finding frequent items in a continuous stream of itemsets. A new frequency measure is introduced, based on a flexible window length. For a given item, its current frequency in the stream is defined as the maximal frequency over all windows from any point in the past until the current state. We study the properties of the new measure, and propose an incremental algorithm ...
Finding frequent items in data streams
The frequent items problem is to process a stream of items and find all items occurring more than a given fraction of the time. It is one of the most heavily studied problems in data stream mining, dating back to the 1980s. Many applications rely directly or indirectly on finding the frequent items, and implementations are in use in large scale industrial systems. However, there has not been mu...
Time and Space Complexity Reduction of a Cryptanalysis Algorithm
A Binary Decision Diagram (BDD) is an efficient data structure that has been widely used in computer science and engineering. The BDD-based attack in key stream cryptanalysis is one of the best forms of attack in its category. In this paper, we propose a new key stream attack which is based on ZDD (Zero-suppressed BDD). We show how a ZDD-based key stream attack is more efficient in time and ...
Finding Frequent Items in Data Streams
We present a 1-pass algorithm for estimating the most frequent items in a data stream using very limited storage space. Our method relies on a novel data structure called a count sketch, which allows us to estimate the frequencies of all the items in the stream. Our algorithm achieves better space bounds than the previous best known algorithms for this problem for many natural distributions on ...
Finding Frequent Items over General Update Streams
We present novel space and time-efficient algorithms for finding frequent items over general update streams. Our algorithms are based on a novel adaptation of the popular dyadic intervals method for finding frequent items. The algorithms improve upon existing algorithms in both theory and practice.
Journal title: JCP
Volume 7
Publication date: 2012